Description

This report shows the gap-filled images produced by the fill_gaps.R script.

Different parameters were tested on the following data. Two weeks are included, one with good weekly coverage and one with poor coverage:

Region: Northwest Atlantic (NWA, 39 to 82 N, 42 to 95 W)  
Sensor: MODIS   
Resolution: 4km   
Processing level: Level 3, binned (L3b)  
Year: 2015  
Weeks: 7, 22  
Pixels outside 0-64 mg m^-3 removed  
Days with < 5% coverage removed  
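
The two removal rules in the list above could be applied along these lines. This is a minimal sketch, not the code in fill_gaps.R: the matrix layout (pixels in rows, days in columns), the names `chl` and `chl_kept`, the inclusive treatment of the 0 and 64 bounds, and measuring coverage against every pixel in the grid are all assumptions made for illustration.

```r
# Hypothetical matrix: rows = pixels, columns = days; NA marks a gap.
chl <- matrix(c(0.5, 70, NA, 1.2,     # day 1: one value above 64 mg m^-3
                NA,  NA, NA, NA,      # day 2: no coverage at all
                0.3, 0.8, 2.0, NA),   # day 3
              nrow = 4)

# Rule 1: remove pixels outside the 0-64 mg m^-3 range (bounds kept, by assumption)
chl[!is.na(chl) & (chl < 0 | chl > 64)] <- NA

# Rule 2: fraction of valid pixels per day, then drop days under 5% coverage
coverage <- colMeans(!is.na(chl))
chl_kept <- chl[, coverage >= 0.05, drop = FALSE]
```

In this toy case day 2 is dropped, and the out-of-range value on day 1 becomes a gap to be filled.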

ImputeEOF withholds a random sample of valid pixels for cross-validation. The sample size is the larger of 30 pixels or 10% of the pixels. The function keeps adding EOFs, each time calculating the RMSE between the real and reconstructed cross-validation pixels, until the change in RMSE from the previous iteration falls below a threshold (i.e. adding the most recent EOF did not significantly improve the RMSE). This threshold, called the “tolerance”, depends on whether the data are filled in linear space or in log space, since a log RMSE will be only a fraction of the size of a linear RMSE:

Tolerance for filling logged data: 0.001
Tolerance for filling linear data: 0.01
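
The stopping rule described above can be sketched on synthetic data. This is an illustration only and not the ImputeEOF implementation: the synthetic field, the fixed number of inner iterations, the use of base R `svd()`, and the choice to compare "previous RMSE minus current RMSE" against the tolerance are assumptions. Only the sample-size rule and the tolerance value come from the text.

```r
set.seed(42)

# Synthetic field: two smooth modes plus small noise (50 pixels x 40 time steps)
n_pix <- 50; n_time <- 40
truth <- outer(sin(seq(0, pi, length.out = n_pix)),
               cos(seq(0, 4 * pi, length.out = n_time))) +
         outer(seq(-1, 1, length.out = n_pix),
               sin(seq(0, 2 * pi, length.out = n_time))) +
         matrix(rnorm(n_pix * n_time, sd = 0.05), n_pix, n_time)

# Knock out 30% of the values to mimic gaps
obs <- truth
obs[sample(length(obs), round(0.3 * length(obs)))] <- NA

# Cross-validation pixels: the larger of 30 or 10% of the valid pixels
valid   <- which(!is.na(obs))
n_cv    <- max(30, round(0.10 * length(valid)))
cv_idx  <- sample(valid, n_cv)
cv_true <- obs[cv_idx]

work <- obs
work[cv_idx] <- NA
missing <- which(is.na(work))
work[missing] <- 0              # first guess for every unknown value

tolerance <- 0.001              # value quoted in the text for logged data
max_eof   <- 10
rmse      <- numeric(0)

for (k in seq_len(max_eof)) {
  # Refine the fill using k EOFs (fixed inner iteration count for brevity)
  for (i in 1:50) {
    s     <- svd(work, nu = k, nv = k)
    recon <- s$u %*% diag(s$d[1:k], k, k) %*% t(s$v)
    work[missing] <- recon[missing]
  }
  rmse[k] <- sqrt(mean((recon[cv_idx] - cv_true)^2))
  # Stop when the newest EOF improved the RMSE by less than the tolerance
  if (k > 1 && (rmse[k - 1] - rmse[k]) < tolerance) break
}

n_eof_used <- length(rmse)
```

`rmse` holds the cross-validation RMSE after each added EOF, and `n_eof_used` is the count at which the loop stopped.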

We start by using a year of data to fill the gaps, and compare different methods below. Then, using the best options, we’ll try using a longer time series.

For each method of filling gaps, we’ll examine the following:

The linear regression uses the standard major axis (SMA) method from lmodel2::lmodel2(). SMA minimizes the area of the triangle formed by each point and the regression line, rather than the distance in the x or y direction alone (i.e. it assumes there is error in both the independent and dependent variables, here the “real” and filled/reconstructed data).
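
To show how SMA differs from ordinary least squares, here is a minimal base R sketch using the closed-form SMA slope (the ratio of the standard deviations, signed by the correlation). It is not the lmodel2 code used for this report, and the `x` and `y` values are hypothetical.

```r
# Hypothetical paired values: "real" (x) and filled/reconstructed (y)
x <- c(0.2, 0.5, 0.9, 1.4, 2.0, 2.7)
y <- c(0.3, 0.4, 1.1, 1.3, 2.2, 2.5)

# SMA: slope is sd(y) / sd(x), carrying the sign of the correlation
sma_slope     <- sign(cor(x, y)) * sd(y) / sd(x)
sma_intercept <- mean(y) - sma_slope * mean(x)

# Ordinary least squares for comparison (error assumed in y only)
ols_slope <- unname(coef(lm(y ~ x))[2])

# SMA is symmetric: regressing x on y gives the reciprocal slope
sma_slope_xy <- sign(cor(y, x)) * sd(x) / sd(y)
```

Because the OLS slope equals the correlation times sd(y) / sd(x), the SMA slope is never smaller in magnitude than the OLS slope, and the two agree only when the correlation is perfect.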

Also note that for the tests that involve filling an 8day composite, in situ matchups should be interpreted with caution because of the long temporal bin and the changes that could occur in concentrations and patterns within that time span.

An analysis of DINEOF on the Canadian Pacific coast:
Hilborn A, Costa M. Applications of DINEOF to Satellite-Derived Chlorophyll-a from a Productive Coastal Region. Remote Sensing. 2018; 10(9):1449. https://doi.org/10.3390/rs10091449

8day vs daily

Chla algorithm: OCx
Logged/linear data: Logged

Which is better: filling the gaps in 8day data, or filling the gaps in daily data and then averaging the result into an 8day image?

Although some R^2 metrics appear better for the daily filled version, overall the 8day cross-validation data has a better fit and less bias (e.g. it identifies some patterns of higher concentration better than the daily fill).

8day

Number of EOF: 5 
 Total RMSE: 0.2224261 
 Week 7 RMSE: 0.287156 
 Week 22 RMSE: 0.1992027

Daily

Number of EOF: 11 
 Total RMSE: 0.2062114 
 Week 7 RMSE: 0.372335 
 Week 22 RMSE: 0.1860863

OCx vs POLY4

Temporal binning: 8day
Logged/linear data: Logged

Should the OCx or POLY4 algorithm be used? Note that POLY4 has been shown to remove some of the bias in the NWA.
OCx = global band-ratio
POLY4 = regional band-ratio, tuned to NWA

The POLY4 algorithm does appear to remove some of the bias and improve the validity of the reconstructed values.

OCx

Number of EOF: 5 
 Total RMSE: 0.2224261 
 Week 7 RMSE: 0.287156 
 Week 22 RMSE: 0.1992027

POLY4

Number of EOF: 6 
 Total RMSE: 0.257584 
 Week 7 RMSE: 0.3356319 
 Week 22 RMSE: 0.2588712

Log vs linear

Temporal binning: 8day
Chla algorithm: POLY4

Should we use logged data or linear data to fill the gaps?
Note the process for the log option:

Logged data gives a smoother fill as it is not negatively impacted by isolated spikes over relatively low and consistent concentrations.

Log

Number of EOF: 6 
 Total RMSE: 0.257584 
 Week 7 RMSE: 0.3356319 
 Week 22 RMSE: 0.2588712

Linear

Number of EOF: 5 
 Total RMSE: 1.806879 
 Week 7 RMSE: 2.12901 
 Week 22 RMSE: 1.298033
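
When reading the log and linear RMSE values side by side, it may help to remember that they are not on the same scale if each is computed in the space used for filling. The sketch below uses hypothetical concentrations, assumes a base-10 log, and applies the same 20% relative error to every pixel.

```r
# Hypothetical true and reconstructed concentrations (mg m^-3);
# every reconstructed value is 20% too high
truth <- c(0.3, 0.5, 1.0, 4.0, 30)
recon <- truth * 1.2

rmse <- function(a, b) sqrt(mean((a - b)^2))

# Linear space: units of mg m^-3, dominated by the largest concentration
rmse_linear <- rmse(recon, truth)

# Log space: log10 units, equal to log10(1.2) for every pixel here
rmse_log <- rmse(log10(recon), log10(truth))
```

The same relative error gives a much larger number in linear space than in log space, which is also why the two tolerances differ by an order of magnitude.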

Longer time series

If more satellite images are used in the algorithm, will it improve the results?

Hilborn and Costa (2018) found that pixel reconstruction improved with more data in a smaller region on the Canadian Pacific coast. Up until this point we have only used one year of data to fill the gaps, but here we’ll try adding more (an equal number of years on either side of the target year, 2015).
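
The symmetric windows around the target year can be written out explicitly. `window_years()` is a hypothetical helper introduced only for this illustration and is not part of fill_gaps.R.

```r
target_year <- 2015

# Hypothetical helper: the years in a window of n_years centred on the target
window_years <- function(target, n_years) {
  stopifnot(n_years %% 2 == 1)   # an odd length keeps the window symmetric
  half <- (n_years - 1) / 2
  seq(target - half, target + half)
}

# Windows for the 1 year, 3 year and 5 year tests
windows <- lapply(c(1, 3, 5), window_years, target = target_year)
```

This gives 2015 alone, 2014 to 2016, and 2013 to 2017.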

WARNING! The 1year/3year/5year comparisons do not currently use the same cross-validation (CV) pixels for 2015, so the RMSE values below are not strictly comparable (this will be updated soon for a more accurate comparison).

1 year

Number of EOF: 6 
 Total RMSE: 0.257584 
 Week 7 RMSE: 0.3356319 
 Week 22 RMSE: 0.2588712

3 years

Number of EOF: 11 
 Total RMSE: 0.2315096 
 Week 7 RMSE: 0.3038488 
 Week 22 RMSE: 0.2453207

5 years

Number of EOF: 13 
 Total RMSE: 0.2248574 
 Week 7 RMSE: 0.3209307 
 Week 22 RMSE: 0.2453435

Larger area

Here we’ll try adjusting the region used to fill the data.